Language/OS - Multiplatform Resource Library

home *** CD-ROM | disk | FTP | other *** search

/ Language/OS - Multiplatform Resource Library / LANGUAGE OS.iso / scheme / siod / siod_v20.lha / siod.doc < prev next >

Wrap

Text File | 1993-08-16 | 15.1 KB | 441 lines

* COPYRIGHT (c) 1989 BY * * PARADIGM ASSOCIATES INCORPORATED, CAMBRIDGE, MASSACHUSETTS. * * See the source file SLIB.C for more information. * Documentation for Release 2.0 1-DEC-89, George Carrette [Release Notes:] 1.4 This release is functionally the same as release 1.3 but has been remodularized in response to people who have been encorporating SIOD as an interpreted extension language in other systems. 1.5 Added the -g flag to enable mark-and-sweep garbage collection. The default is stop-and-copy. 2.0 Set_Repl_Hooks, catch & throw. gjc@paradigm.com George Carrette Paradigm Associates Inc Phone: 617-492-6079 29 Putnam Ave, Suite 6 Cambridge, MA 02138 [Files:] siod.h Declarations slib.c scheme library. siod.c a main program. siod.scm (optional) Some scheme code [Motivation:] The most obvious thing one should notice is that this lisp implementation is extremely small. For example, the resulting binary executable file on a VAX/VMS system with /notraceback/nodebug is 17 kilo-bytes. Small enough to understand, the source file slib.c is 30 kilo-bytes. Small enough to include in the smallest applications which require command interpreters or extension languages. We also want to be able to run code from the book "Structure and Interpretation of Computer Programs." Techniques used will be familiar to most lisp implementors. Having objects be all the same size, and having only two statically allocated spaces simplifies and speeds up both consing and gc considerably. the MSUBR hack allows for a modular implementation of tail recursion, an extension of the FSUBR that is, as far as I know, original. The optional mark and sweep garbage collector may be selected at runtime. Error handling is rather crude. A topic taken with machine fault, exception handling, tracing, debugging, and state recovery which we could cover in detail, but is clearly beyond the scope of this implementation. Suffice it to say that if you have a good symbolic debugger you can set a break point at "err" and observe in detail all the arguments and local variables of the procedures in question, since there is no casting of data types. For example, if X is an offending or interesting object then examining X->type will give you the type, and X->storage_as.cons will show the car and the cdr. [Garbage Collection:] There are two storage management techniques which may be chosen at runtime by specifying the -g argument flag. -g1 (the default) is stop-and-copy. This is the simplest and most portable implementation. GC is only done at toplevel. -g0 is mark-and-sweep. GC is done at any time, required or requested. However, the implementation is not as portable. Discussion of stop-and-copy follows: As one can see from the source, garbage collection is really quite an easy thing. The procedure gc_relocate is about 25 lines of code, and scan_newspace is about 15. The real tricks in handling garbage collection are (in a copying gc): (1) keeping track of locations containing objects (2) parsing the heap (in the space scanning) The procedure gc_protect is called once (e.g. at startup) on each "global" location which will contain a lisp object. That leaves the stack. Now, if we had chosen not to use the argument and return-value passing mechanism provided by the C-language implementation, (also known as the "machine stack" and "machine procedure calling mechanism) this lisp would be larger, slower, and rather more difficult to read and understand. Furthermore it would be considerably more painful to *add* functionality in the way of SUBR's to the implementation. Aside from writing a very machine and compiler specific assembling language routine for each C-language implementation, embodying assumptions about the placement choices for arguments and local values, etc, we are left with the following limitation: "YOU CAN GC ONLY AT TOP-LEVEL" However, this fits in perfectly with the programming style imposed in many user interface implementations including the MIT X-Windows Toolkit. In the X Toolkit, a callback or work procedure is not supposed to spend much time implementing the action. Therefore it cannot have allocated much storage, and the callback trampoline mechanism can post a work procedure to call the garbage collector when needed. Our simple object format makes parsing the heap rather trivial. In more complex situations one ends up requiring object headers or markers of some kind to keep track of the actual storage lengths of objects and what components of objects are lisp pointers. Discussion of mark-and-sweep follows: In a mark-and-sweep GC the objects are not relocated. Instead one only has to LOOK at objects which are referenced by the argument frames and local variables of the underlying (in this case C-coded) implementation procedures. If a pointer "LOOKS" like it is a valid lisp object (see the procedure mark_locations_array) then it may be marked, and the objects it points to may be marked, as being in-use storage which is not linked into the freelist in the gc_sweep phase. Another advantage of the mark_and_sweep storage management technique is that only one heap is required. This main disadvantages are: (1) start-up cost to initially link freelist. (can be avoided by more general but slower NEWCELL code). (2) does not COMPACT or LOCALIZE the use of storage. This is POOR engineering practice in a virtual memory environment. (2) the entire heap must be looked at, not just the parts with useful storage. In general, mark-and-sweep is slower in that it has to look at more memory locations for a given heap size, however the heap size can be smaller for a given problem being solved. More complex analysis is required when READ-ONLY, STATIC, storage spaces are considered. Additionally the most sophisticated stop-and-copy storage management techniques take into account considerations of object usage temporality. [Compilation:] The code has been compiled and run by the author on Sun III and IV, Encore Multimax, 4.3BSD VAX, VAX/VMS, and AMIGA 500 using the Lattice C compiler. On all unix machines use (with floating-point flags as needed) %cc -O -c siod.c %cc -O -c slib.c %cc -o siod siod.o slib.o on VAX/VMS: $ cc siod $ cc slib $ link siod,slib,sys$input:/opt sys$library:vaxcrtl/share $ siod == "$" + F$ENV("DEFAULT") + "SIOD" on AMIGA 500, ignore warning messages about return value mismatches, %lc siod.c %lc slib.c %blink lib:c.o,siod.o,slib.o to siod lib lib:lcm.lib,lib:lc.lib,lib:amiga.lib [Invocation:] siod [-hXXXXX] [-iXXXXX] [-gX] -h where XXXXX is an integer, to specify the heap size, in obj cells, -i where XXXXX is a filename to load before going into the repl loop. -g where X = 1 for stop-and-copy GC, X = 0 for mark-and-sweep. Example: siod -isiod.scm -h100000 [System:] The interrupts called SIGINT and SIGFPE by the C runtime system are handled by invoking the lisp error procedure. SIGINT is usually caused by the CONTROL-C character and SIGFPE by floating point overflow or underflow. [Syntax:] The only special characters are the parenthesis and single quote. Everything else, besides whitespace of course, will make up a regular token. These tokens are either symbols or numbers depending on what they look like. Dotted-list notation is not supported on input, only on output. [Special forms:] The CAR of a list is evaluated first, if the value is a SUBR of type 9 or 10 then it is a special form. (define symbol value) is presently like (set! symbol value). (define (f . arglist) . body) ==> (define f (lambda arglist . body)) (lambda arglist . body) Returns a closure. (if pred val1 val2) If pred evaluates to () then val2 is evaluated else val1. (begin . body) Each form in body is evaluated with the result of the last returned. (set! symbol value) Evaluates value and sets the local or global value of the symbol. (or x1 x2 x3 ...) Returns the first Xn such that Xn evaluated non-(). (and x1 x2 x3 ...) Keeps evaluating Xj until one returns (), or Xn. (quote form). Input syntax 'form, returns form without evaluation. (let pairlist . body) Each element in pairlist is (variable value). Evaluates each value then sets of new bindings for each of the variables, then evaluates the body like the body of a progn. This is actually implemented as a macro turning into a let-internal form. (the-environment) Returns the current lexical environment. [Macro Special forms:] If the CAR of a list evaluates to a symbol then the value of that symbol is called on a single argument, the original form. The result of this application is a new form which is recursively evaluated. [Built-In functions:] These are all SUBR's of type 4,5,6,7, taking from 0 to 3 arguments with extra arguments ignored, (not even evaluated!) and arguments not given defaulting to (). SUBR's of type 8 are lexprs, receiving a list of arguments. Order of evaluation of arguments will depend on the implementation choice of your system C compiler. consp cons car cdr set-car! set-cdr! number? + - * / < > eqv? The arithmetic functions all take two arguments. eq?, pointer objective identity. (Use eqv? for numbers.) symbolconc, takes symbols as arguments and appends them. symbol? symbol-bound? takes an optional environment structure. symbol-value also takes optional env. set-symbol-value also takes optional env. env-lookup takes a symbol and an environment structure. If it returns non-nil the CAR will be the value of the symbol. assq read,print eval, takes a second argument, an environment. copy-list. Copies the top level conses in a list. oblist, returns a copy of the list of the symbols that have been interned. gc-status, prints out the status of garbage collection services, the number of cells allocated and the number of cells free. If given a () argument turns gc services off, if non-() turns gc services on. In mark-and-sweep storage management mode the argument only turns on and off verbosity of GC messages. gc, does a mark-and-sweep garbage collection. If called with argument nil does not print gc messages during the gc. load, given a filename (which must be a symbol, there are no strings) will read/eval all the forms in that file. quit, will exit back to the operating system. error, takes a symbol as its first argument, prints the pname of this as an error message. The second argument (optional) is an offensive object. The global variable errobj gets set to this object for later observation. null?, not. are the same thing. *catch tag exp, Sets up a dynamic catch frame using tag. Then evaluates exp. *throw tag value, finds the nearest *catch with an EQ tag, and cause it to return value. [Procedures in main program siod.c] edit is a VMS specific function that takes a single filename argument and calls the EDT editor (as a procedure) to edit the file. cfib is the same as standard-fib. You can time it and compare it with standard-fib to get an idea of the overhead of interpretation. [Utility procedures in siod.scm:] Shows how to define macros. cadr,caddr,cdddr,replace,list. (defvar variable default-value) And for us old maclisp hackers, setq and defun, and progn, etc. call-with-current-continuation Implemented in terms of *catch and *throw. So upward continuations are not allowed. [A streams implementation:] The first thing we must do is decide how to represent a stream. There is only one reasonable data structure available to us, the list. So we might use (<stream-car> <cache-flag> <cdr-cache> <cdr-procedure>) the-empty-stream is just (). empty-stream? head tail cons-stream is a special form. Wraps a lambda around the second argument. *cons-stream is the low-level constructor used by cons-stream. [Benchmarks:] A standard-fib procedure is included in siod.scm so that everyone will use the same definition in any reports of speed. Make sure the return result is correct. use command line argument of %siod -h100000 -isiod.scm (standard-fib 10) => 55 ; 795 cons work. (standard-fib 15) => 610 ; 8877 cons work. (standard-fib 20) => 6765 ; 98508 cons work. [Porting:] The only code under #ifdef is the definition of myruntime, which should be defined to return a double float, the number of cpu seconds used by the process so far. It uses the the tms_utime slot, and assumes a clock cycle of 1/60'th of a second. The stack and register marking code used in the mark-and-sweep GC is unlikely to work on machines that do not keep the procedure call stack in main memory at all times. It is assumed that setjmp saves all registers into the jmp_buff data structure. There is a bit of type casting in close_open_files and vload. The pname of an un-interned symbol is used as a pointer to FILE. This saves the code (a conser, a print case, and two gc cases) of defining a new data type for keeping track of binary data. If the stack is not always aligned (in LISP-PTR sense) then the gc_mark_and_sweep procedure will not work properly. Example, assuming a byte addressed 32-bit pointer machine: stack_start_ptr: [LISP-PTR(4)] [LISP-PTR(4)] [RANDOM(4)] [RANDOM(2)] [LISP-PTR(4)] [LISP-PTR(4)] [RANDOM(2)] [LISP-PTR(4)] [LISP-PTR(4)] stack_end: [LISP-PTR(4)] As mark_locations goes from start to end it will get off proper alignment somewhere in the middle, and therefore the stack marking operation will not properly identify some valid lisp pointers. Fortunately there is an easy fix to this. A more aggressive use of our mark_locations procedure will suffice. For example, say that there might be 2-byte random objects inserted into the stack. Then use two calls to mark_locations: mark_locations(((char *)stack_start_ptr) + 0,((char *)&stack_end) + 0); mark_locations(((char *)stack_start_ptr) + 2,((char *)&stack_end) + 2); If we think there might be 1-byte random objects, then 4 calls are required: mark_locations(((char *)stack_start_ptr) + 0,((char *)&stack_end) + 0); mark_locations(((char *)stack_start_ptr) + 1,((char *)&stack_end) + 1); mark_locations(((char *)stack_start_ptr) + 2,((char *)&stack_end) + 2); mark_locations(((char *)stack_start_ptr) + 3,((char *)&stack_end) + 3); [Interface to other programs:] If your main program does not want to actually have a read/eval/print loop, and instead wants to do something else entirely, then use the routine set_repl_hooks to set up for procedures for: * putting the prompt "> " and other info strings to standard output. * reading (getting) an expression * evaluating an expression * printing an expression. The routine get_eof_val may be called inside your reading procedure to return a value that will cause exit from the read/eval/print loop. In order to call a single C function in the context of the repl loop, you can do the following: int flag = 0; void my_puts(st) char *st; {} LISP my_reader() {if (flag == 1) return(get_eof_val()); flag == 1; return(NIL);} LISP my_eval(x) LISP x; {call_my_c_function(); return(NIL);} LISP my_print(x) LISP x; {} do_my_c_function() {set_repl_hooks(my_puts,my_read,my_eval,my_print); repl_driver(1, /* or 0 if we do not want lisp's SIGINT handler */ 0);}